Non-Stationary Approximate Modified Policy Iteration
Authors
Boris Lesner, Bruno Scherrer
Abstract
We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision Processes. Running any instance of Modified Policy Iteration (a family of algorithms that can interpolate between Value and Policy Iteration) with an error ε at each iteration is known to lead to stationary policies that are at least 2γε/(1−γ)^2-optimal. Variations of Value and Policy Iteration that build ℓ-periodic non-stationary policies have recently been shown to display a better 2γε/((1−γ)(1−γ^ℓ))-optimality guarantee. We describe a new algorithmic scheme, Non-Stationary Modified Policy Iteration, a family of algorithms parameterized by two integers m ≥ 0 and ℓ ≥ 1 that generalizes all the above-mentioned algorithms. While m allows one to interpolate between Value-Iteration-style and Policy-Iteration-style updates, ℓ specifies the period of the non-stationary policy that is output. We show that this new family of algorithms also enjoys the improved 2γε/((1−γ)(1−γ^ℓ))-optimality guarantee. Perhaps more importantly, we show, by exhibiting an original problem instance, that this guarantee is tight for all m and ℓ; this tightness was to our knowledge only known in two specific cases, Value Iteration (m = 0, ℓ = 1) and Policy Iteration (m = ∞, ℓ = 1).
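Read literally, the scheme alternates a greedy step with m partial evaluation backups and outputs the last ℓ greedy policies, to be run in a loop. The following is a minimal tabular sketch of that shape, assuming known transition kernels P and rewards R and exact backups; the function name and array layout are our own choices, and this is not claimed to be the paper's exact update rule, only the ingredients the abstract names.

```python
import numpy as np

def ns_mpi(P, R, gamma, m=3, ell=2, iters=100):
    # P: (A, S, S) transition kernels, R: (A, S) rewards (assumed shapes).
    A, S, _ = P.shape
    v = np.zeros(S)
    policies = []
    for _ in range(iters):
        # Greedy step: one optimality backup picks the next policy.
        q = R + gamma * np.einsum("ast,t->as", P, v)      # Q-backup, (A, S)
        pi = q.argmax(axis=0)                             # greedy policy
        policies.append(pi)
        # Partial evaluation: m further backups under the fixed policy pi
        # (m = 0 gives a VI-style update, large m approaches PI).
        v = q[pi, np.arange(S)]
        for _ in range(m):
            v = R[pi, np.arange(S)] + gamma * P[pi, np.arange(S)] @ v
    # The last ell policies, executed cyclically (policy t mod ell at
    # time t), form the ell-periodic non-stationary policy.
    return policies[-ell:], v
```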
Similar resources
Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies
We consider approximate dynamic programming for the infinite-horizon stationary γ-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal policy that is stationary, we show that when using value function approximation, looking for a non-stationary policy may lead to a better performance guarantee. We def...
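To make the gain concrete, the two guarantees quoted in the abstract above can be put side by side (ε is the per-iteration error); at ℓ = 1 they coincide, and the ℓ-periodic bound improves monotonically with ℓ:

```latex
\[
  \underbrace{\frac{2\gamma}{(1-\gamma)^{2}}\,\epsilon}_{\text{stationary, } \ell = 1}
  \;\ge\;
  \underbrace{\frac{2\gamma}{(1-\gamma)(1-\gamma^{\ell})}\,\epsilon}_{\ell\text{-periodic}}
  \;\xrightarrow{\;\ell \to \infty\;}\;
  \frac{2\gamma}{1-\gamma}\,\epsilon .
\]
```

Growing the period ℓ thus trades one of the two 1/(1−γ) factors for the milder 1/(1−γ^ℓ), and removes it entirely in the limit.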
On Approximate Stationary Radial Solutions for a Class of Boundary Value Problems Arising in Epitaxial Growth Theory
In this paper, we consider a non-self-adjoint, singular, nonlinear fourth-order boundary value problem which arises in the theory of epitaxial growth. It is possible to reduce the fourth-order equation to a singular boundary value problem of second order, given by w'' − (1/r)w' = w²/(2r²) + (λ/2)r². The problem depends on the parameter λ and admits multiple solutions. Therefore, it is difficult to...
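Given the reduced second-order equation, a numerical sketch is possible with a standard BVP solver. The snippet below uses scipy.integrate.solve_bvp on the equation as written; the boundary conditions and the value of λ are placeholders, since the snippet above does not state them.

```python
import numpy as np
from scipy.integrate import solve_bvp

lam = 1.0   # the parameter λ; this value is an arbitrary choice

def rhs(r, y):
    # y[0] = w, y[1] = w'; the reduced equation rearranged as
    # w'' = (1/r) w' + w**2 / (2 r**2) + (λ/2) r**2
    return np.vstack([y[1],
                      y[1] / r + y[0]**2 / (2 * r**2) + 0.5 * lam * r**2])

def bc(ya, yb):
    # Placeholder Dirichlet conditions w(a) = w(1) = 0; the paper's
    # actual boundary conditions are not given in the snippet above.
    return np.array([ya[0], yb[0]])

r = np.linspace(1e-3, 1.0, 200)   # start away from the singularity at r = 0
sol = solve_bvp(rhs, bc, r, np.zeros((2, r.size)))
print(sol.status, sol.message)
```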
On the Performance Bounds of some Policy Search Dynamic Programming Algorithms
We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms, which compute an approximately optimal policy by following the standard Policy Iteration (PI) scheme via an ε-approximate greedy operator (Kakade and Langford, 2002; Lazaric et al., 2010). We describe existing and a few new performance bounds for Direc...
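A minimal sketch of the one operator this snippet names, an ε-approximate greedy step: any action whose backup comes within ε of the best is admissible. The function name and array shapes are our assumptions; eps = 0 recovers the exact greedy operator.

```python
import numpy as np

def eps_approximate_greedy(P, R, gamma, v, eps, rng=np.random.default_rng(0)):
    # Q-backup of v: q[a, s] = R[a, s] + gamma * sum_t P[a, s, t] * v[t]
    q = R + gamma * np.einsum("ast,t->as", P, v)
    best = q.max(axis=0)
    # Any action within eps of the best backup is acceptable; pick one
    # at random to model the slack the operator is allowed.
    return np.array([rng.choice(np.flatnonzero(q[:, s] >= best[s] - eps))
                     for s in range(q.shape[1])])
```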
Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in Lp-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration, and Approximate Generalized Policy Iteration). We show that we ca...
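Of the three algorithms named, the Value Iteration variant is the easiest to picture. Below is a small sketch of value iteration for a zero-sum two-player Markov game on a tabular model; the shapes and names are assumptions, and the caveat in the comments matters.

```python
import numpy as np

def zero_sum_value_iteration(P, R, gamma, iters=500):
    # P: (A, B, S, S) transitions, R: (A, B, S) rewards to the maximizer
    # (assumed shapes).  Caveat: the faithful Shapley backup solves a
    # matrix game (a small LP) in every state; this sketch takes the
    # pure-strategy max-min value instead, which matches it only when a
    # pure saddle point exists.
    S = P.shape[2]
    v = np.zeros(S)
    for _ in range(iters):
        q = R + gamma * np.einsum("abst,t->abs", P, v)  # (A, B, S)
        v = q.min(axis=1).max(axis=0)                   # max_a min_b per state
    return v
```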
Solving time-fractional chemical engineering equations by modified variational iteration method as fixed point iteration method
The variational iteration method (VIM) was extended to find approximate solutions of fractional chemical engineering equations. The Lagrange multipliers of the VIM were not identified explicitly. In this paper we improve the VIM by using the concept of the fixed-point iteration method. This method was then implemented for solving a system of time-fractional chemical engineering equations. The ob...
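The fixed-point reading of VIM can be shown on a toy problem. The sketch below applies the classic VIM correction functional, with Lagrange multiplier λ(s) = −1, to the ordinary ODE u' + u = 0, u(0) = 1; it illustrates the shape of the iteration, not the paper's fractional system or its improved multiplier choice.

```python
import sympy as sp

t, s = sp.symbols("t s")

# Classic VIM correction functional, viewed as a fixed-point iteration:
#   u_{n+1}(t) = u_n(t) - ∫_0^t (u_n'(s) + u_n(s)) ds
u = sp.Integer(1)                       # u_0 from the initial condition
for _ in range(5):
    residual = sp.diff(u, t) + u        # defect of the current iterate
    u = sp.expand(u - sp.integrate(residual.subs(t, s), (s, 0, t)))
print(u)   # the first terms of the Taylor series of exp(-t)
```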
Publication date: 2015